DETAILED ACTION

1. This office action is in response to correspondence filed 08/21/08 regarding
application 10/554010, in which claims 1-10 were amended, claims 11-12 were 
cancelled, and new claim 13 was added. Claims 1-10 and 13 are pending in the 
application and have been considered. 

Response to Arguments 

2. Applicant's amendment to claim 6 overcomes the rejection under 35 U.S.C. 112,
and so the rejection is withdrawn. 

3. Since applicant has cancelled claim 11, the rejection under 35 U.S.C. 101 of this
claim is moot. 

4. Applicant's arguments with respect to claims 1-10 and 13 on pages 8-10 of the 
Remarks have been considered but are moot in view of the new ground(s) of rejection. 

Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

6. Claims 1-4, 7-10 and 13 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Blum et al. (5,918,223) in view of Scheirer et al. ("Construction and
Evaluation of a Robust Multifeature Speech/Music Discriminator". Proceedings of the
1997 IEEE International Conference on Acoustics, Speech, and Signal Processing
(ICASSP '97), Vol. 2, p1331-1334).

Consider claims 1 and 8, Blum discloses a system and method for classifying at 
least one audio signal (Abstract lines 3-4, the analysis of audio data can be used to 
classify) into at least one audio class (Abstract lines 7-8, classes of audio files) by: 

analyzing said audio signal to extract at least one predetermined audio feature 
(Abstract lines 1-4, analysis... of audio files produces a set of feature vectors); 

performing a frequency analysis on a set of values of said extracted 
predetermined audio feature at different time instances resulting in a magnitude 
spectrum of said extracted predetermined audio feature (Col 15 lines 43-44, bass 
spectrum, which represents the bass trajectory at different time instances, is subjected 
to an FFT); 

deriving at least one further audio feature representing a temporal behavior of 
said extracted predetermined audio feature by parameterizing said magnitude spectrum 
(Col 15 lines 50-60, beats detected from magnitude peaks representing a temporal 
behavior); and 

classifying said audio signal based on said further audio feature (Col 21 lines 
53-65, the signal is classified into categories using statistical measures derived from the 
feature vectors). 

While Blum discloses a magnitude spectrum, Blum does not specifically mention
a power spectrum.

Scheirer discloses a power spectrum (p1131 Col 2).

It would have been obvious to one of ordinary skill in the art at the time of the 
invention to modify the invention of Blum by using spectral power instead of spectral 
magnitude, which would aid in peak detection by enhancing contrast between spectral 
peaks and valleys. 
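
As an illustration only, and not as a statement of what Blum or Scheirer actually implement, the contrast effect relied on in this rationale can be sketched as follows. The sketch assumes a hypothetical feature trajectory of 64 frames containing one strong and one weak periodic component; the function names and the trajectory values are assumptions made for the example and do not appear in either reference.

```python
import cmath
import math


def dft_magnitude(values):
    """Magnitude spectrum (bins 0..N//2) of a real sequence, via a naive DFT."""
    n = len(values)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        spectrum.append(abs(coeff))
    return spectrum


def power_spectrum(values):
    """Power spectrum: the squared magnitude of each DFT bin."""
    return [m * m for m in dft_magnitude(values)]


# Hypothetical feature trajectory (assumption): a constant offset plus a strong
# component at bin 4 and a weak component at bin 10 of a 64-frame window.
n_frames = 64
feature = [1.0
           + 0.5 * math.sin(2 * math.pi * 4 * t / n_frames)
           + 0.1 * math.sin(2 * math.pi * 10 * t / n_frames)
           for t in range(n_frames)]

mag = dft_magnitude(feature)
pwr = power_spectrum(feature)

# Ratio of the strong peak to the weak peak in each representation.
magnitude_contrast = mag[4] / mag[10]
power_contrast = pwr[4] / pwr[10]
```

Under these assumptions the strong peak is 5 times the weak peak in the magnitude spectrum and 25 times the weak peak in the power spectrum, since squaring each bin squares the ratio between them.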

With respect to claim 2, Blum discloses that at least one predetermined audio 
feature comprises at least pitch (Col 6 lines 45-47). 

With respect to claim 3, Blum discloses the predetermined audio feature 
comprises at least one Mel-frequency cepstral coefficient (Col 6 lines 45-47). 

With respect to claim 4, Blum discloses the predetermined audio feature 
comprises at least sharpness (Col 6 lines 45-47, brightness). 

With respect to claim 7, Blum discloses at least one further audio feature is 
defined as at least one coefficient obtained by performing a discrete cosine 
transformation on the result of a frequency analysis (Col 13 lines 15-17, 32-34, the 
MFCCs are obtained by performing a discrete cosine transform on the FFT result). 

With respect to claim 9, Blum discloses a music system comprising: means for
playing audio data from a medium (Col 5 lines 29-40), in addition to a system as
claimed in claim 8 for classifying said audio data (See claim 1).

With respect to claim 10, Blum discloses a multi-media system (Fig 1) 
comprising: means for playing audio data (Fig 1 UI Adapter 124) from a medium (Fig 1
ROM 104); a system as claimed in claim 8 for classifying said audio data (See claim 1); 
means for displaying video data from a further medium; (Fig 1 Display Adapter 126) 
means for analyzing said video data (Fig 1 CPU 102); and 

means for combining the results obtained from analyzing said video data with the 
results obtained from classifying said audio data (Fig 1, CPU 102 and Display Adapter 
126, the results would be combined and presented on the display by these means). 

With respect to claim 13, Blum does not specifically mention a log power 
spectrum of said extracted predetermined audio feature. 

Scheirer discloses a log power spectrum of said extracted predetermined audio
feature (p1131 Col 2, perceptual channels energy). 

It would have been obvious to one of ordinary skill in the art at the time of the 
invention to modify the invention of Blum such that performing a frequency analysis on a 
set of values of said extracted predetermined audio features at different time instances 
results in a log power spectrum of said extracted predetermined audio feature, which 
would be a more useful metric for classifying sounds intended to be heard by humans, 
who have a logarithmic perception of sound. 
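
For illustration, the conversion from a power spectrum to a log power spectrum can be sketched as below. This is a minimal sketch under stated assumptions, not the method of Blum or Scheirer: the decibel scaling (10 times the base-10 logarithm), the small floor used to avoid taking the logarithm of zero, and the sample power values are all assumptions chosen for the example.

```python
import math


def log_power_spectrum(power, floor=1e-12):
    """Convert power values to decibels; `floor` (an assumption) avoids log(0)."""
    return [10.0 * math.log10(max(p, floor)) for p in power]


# Hypothetical power values (assumption) spanning a wide dynamic range.
power = [4096.0, 256.0, 10.24]
log_power = log_power_spectrum(power)

# The 400:1 range between the largest and smallest power values is
# compressed to a difference of roughly 26 dB on the log scale.
range_db = log_power[0] - log_power[2]
```

The logarithm compresses large ratios into additive differences, which is the sense in which a log scale tracks a logarithmic perception of sound.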

7. Claims 5 and 6 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Blum et al. (5,918,223) in view of Scheirer et al. ("Construction and Evaluation of a
Robust Multifeature Speech/Music Discriminator". Proceedings of the 1997 IEEE
International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97),
Vol. 2, p1331-1334), in further view of Repp ("Quantitative Effects of Global Tempo on
Expressive Timing in Music Performance: Some Perceptual Evidence". Music
Perception, Fall 1995, Vol. 13 No. 1, p39-57).

With respect to claim 5, Blum discloses the deriving step comprises the steps of: 

calculating an average value of said set of values of said extracted 
predetermined audio feature at different time instances (Col 15 lines 43-44, taking an 
FFT produces frequency coefficients, the lowest of which is the DC value, or time 
average, of the signal for the given frame); 

defining at least one frequency band (Col 15 lines 43-44, taking an FFT defines 
at least one frequency bin); 

calculating the amount of energy within said frequency band from said frequency 
analysis (Col 15 lines 43-44, taking an FFT calculates coefficients representative of 
the amount of energy in each frequency bin); and 

defining said further audio feature as said amount of energy (Col 15 lines 44-46).

Blum and Scheirer do not specifically mention defining said further audio feature
as said amount of energy divided by said average value.

Repp discloses defining an audio feature as an amount of energy divided by an
average value (p41, calculation of relative modulation depth requires dividing energy by
an average value).

It would have been obvious to one of ordinary skill in the art at the time of the 
invention to modify the invention of Blum and Scheirer by defining said further audio
feature as said amount of energy divided by said average value, in order to make the 
feature more robust to tempo fluctuations, as suggested by Repp (p41). 
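
The steps recited for claim 5 (an average value, a frequency band, the energy within that band, and the energy divided by the average) can be sketched as follows. This is an illustrative sketch only and is not taken from Blum, Scheirer, or Repp: the function name, the frame rate of 100 feature values per second, and the test trajectory are assumptions, and the 3-15 Hz band is borrowed from claim 6 purely as an example.

```python
import cmath
import math


def band_energy_over_mean(values, frame_rate_hz, band_hz):
    """Energy of the DFT bins inside band_hz, divided by the mean of values."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("average value is zero; the ratio is undefined")
    low, high = band_hz
    energy = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * frame_rate_hz / n  # centre frequency of bin k in Hz
        if low <= freq <= high:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                        for t, v in enumerate(values))
            energy += abs(coeff) ** 2
    return energy / mean


# Hypothetical trajectory (assumption): 100 feature values sampled at
# 100 values per second, with mean 2.0 and a 4 Hz modulation of amplitude 1.0.
frame_rate_hz = 100.0
feature = [2.0 + math.sin(2 * math.pi * 4 * t / 100) for t in range(100)]

in_band = band_energy_over_mean(feature, frame_rate_hz, (3.0, 15.0))
out_of_band = band_energy_over_mean(feature, frame_rate_hz, (20.0, 50.0))
```

With these assumed values the 4 Hz modulation falls inside the 3-15 Hz band, so the in-band result is large (about 1250 in the unnormalized DFT units used here) while the 20-50 Hz result is essentially zero.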

With respect to claim 6, Blum discloses at least one of the following frequency
bands is used in said frequency analysis: 1-2 Hz; 3-15 Hz; and 20-150 Hz (Col 6 lines
65-67, and Col 15 lines 43-44, at the sampling rates disclosed, at least these 
frequency bands would be represented by the FFT spectrum). However, Blum does not 
specifically disclose modulation frequencies. 

Scheirer et al. disclose an audio classifier in which the 4Hz modulation frequency 
energy of the signal is analyzed (p1131 Section 2). It was well known to those skilled in
the art at the time of the invention that speech tends to have more modulation energy at 
4Hz than music does (See Scheirer p1131 Section 2). 

It would have been obvious to try a 3-15Hz modulation frequency parameter as a
feature in Blum's invention for the following reasons: there was a recognized need in the 
field to develop better classification features (See Scheirer p1131 Section 1); there 
were a finite number of identified, predictable ranges that would include the well known 
4Hz frequency; one of ordinary skill could have readily pursued the known ranges with a
reasonable expectation of success; and one of ordinary skill could have used readily
available software to modify the parameter range. 

Conclusion 

8. Applicant's amendment necessitated the new ground(s) of rejection presented in 
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37
CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

9. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Jesse Pullias whose telephone number is 
571/270-5135. The examiner can normally be reached on M-F 9:00 AM - 4:30 PM. If 
attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth, can be reached on 571/272-7843. The fax phone number
for the organization where this application or proceeding is assigned is 571/270-6135.

10. Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for published 
applications may be obtained from either Private PAIR or Public PAIR. Status 
information for unpublished applications is available through Private PAIR only. For 
more information about the PAIR system, see http://pair-direct.uspto.gov. Should you 
have questions on access to the Private PAIR system, contact the Electronic Business 
Center (EBC) at 866-217-9197 (toll-free). 

/Jesse S. Pullias/ 
Examiner, Art Unit 2626

/Talivaldis Ivars Smits/
Primary Examiner, Art Unit 2626 



